Abstract 

Starting from fitness correlation functions, we calculate exact expressions for the amplitude spectra of fitness land- 
scapes as defined by P.F. Stadler [J. Math. Chem. 20, 1 (1996)] for common landscape models, including Kauffman's 
LK-model, rough Mt. Fuji landscapes and general linear superpositions of such landscapes. We further show that 
correlations decaying exponentially with Hamming distance yield exponentially decaying spectra similar to those re- 
ported recently for a model of molecular signal transduction. Finally, we compare our results for the model systems 
to the spectra of various experimentally measured fitness landscapes. We claim that our analytical results should be 
helpful when trying to interpret empirical data and guide the search for improved fitness landscape models. 
1. Introduction 

In evolutionary processes, populations acquire 
changes to their gene content by mutational or recom- 
binational events during reproduction. If those changes 
improve the adaptation of the organism to its environ- 
ment, individuals carrying the modified genome have a 
better chance to survive and leave offspring in the next 
generation. Through the interplay of repeated mutation 
and selection, the genetic structure of the population 
evolves and beneficial alleles increase in frequency. In 
a constant environment the population may thus end up 
in a well adapted state, where beneficial mutations are 
rare or entirely absent and only combinations of several 
mutations can further increase fitness. 

To describe this kind of process, Sewall Wright in- 
troduced the notion of a fitness landscape [1]. Here, the 
genotype is encoded by the coordinates of some suitable 
space and the degree of adaptation or reproductive suc- 
cess is modeled as a real number, called fitness, which 
is identified with the height of the landscape above the 
corresponding genotype. The evolutionary process of 
repeated mutation and selection is thus depicted as a 
hill climbing process. Mutations lead to the exploration 
of new genotypes and selection forces populations to 
move preferentially to genotypes with larger fitness. If 
more than one mutation is necessary to increase fitness, 
the population has reached a local fitness peak. Note 
that some caution is necessary when applying this pic- 
ture, as the way in which genotypes are connected to 



one another does not correspond to the topology of a 
low-dimensional Euclidean space but is more appropri- 
ately described by a graph or network (see below). The 
underlying structure is well known from other areas of 
science, such as spin glasses in statistical physics [2, 3] 
and optimization problems in computer science [4]. 

The concept of fitness landscapes has been very fruit- 
ful for the understanding of evolutionary processes. 
While earlier work in this field has been largely theo- 
retical and computational, in recent years an increasing 
amount of experimental fitness data for mutational land- 
scapes has become available [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]; 
see Ref. [17] for a review. Analysis of 
such data sets provides us with the possibility of a better 
understanding of the biological mechanisms that shape 
fitness landscapes and helps us to build better models. 
Thus, identifying properties of fitness landscapes that 
yield relevant information on evolution is an important 
task. 

One such property that has attracted considerable in- 
terest is epistasis [18]. Epistasis implies that the change 
in fitness that is caused by a specific mutation depends 
on the configurations at other loci, or groups of loci, 
in the genome. In other words, epistasis is the inter- 
action between different loci in their effect on fitness. 
Interactions that only affect the strength of the muta- 
tional effect are referred to as magnitude epistasis, while 
interactions that change a mutation from beneficial to 
deleterious or vice versa are referred to as sign epistasis 
[19]. In the absence of sign epistasis, the fitness land- 
scape contains only a single peak and fitness values fall 
off monotonically with distance to that peak. If sign 
epistasis is present, the landscape can present several 
peaks and valleys, which has important implications for 
the mutational accessibility of the different genotypes 
[17, 20, 21] and shortens the path to the next fitness opti- 
mum [22, 23, 24, 25, 26, 27]. Thus the absence of sign 
epistasis implies a smooth landscape, while landscapes 
with sign epistasis are rugged. 

Beyond the question of the presence of epistasis, one 
would like to be able to make more detailed statements 
about how much of it is present or in which way epista- 
sis is realized in the landscape. A very helpful tool to 
answer this kind of question is the Fourier decompo- 
sition of fitness landscapes introduced in ref. [28]. This 
decomposition makes use of graph theory to expand the 
landscape into components that correspond to interac- 
tions between loci. The coefficients of the decomposi- 
tion corresponding to interactions between a given num- 
ber of loci can be combined to yield the amplitude spec- 
trum. Calculating amplitude spectra numerically for 
data obtained from models or experiments is straight- 
forward in principle, but so far only a small part of the 
information contained in the spectra is actually used. To 
improve this situation, it is important to understand how 
biologically meaningful features of a fitness landscape 
are reflected in its amplitude spectrum. 

In this article, we take a first step in this direction 
by analytically calculating spectra for some of the most 
popular landscape models: the LK model introduced by 
Kauffman [29, 30], two versions of the rough Mt. Fuji 
(RMF) model [20, 31], and a generic model with cor- 
relations that decay exponentially with distance on the 
landscape. Thanks to the linearity of the amplitude de- 
composition, linear superpositions of these landscapes 
can also be treated. We calculate the spectra by ex- 
ploiting their connection to fitness correlation functions 
originally established in ref. [32]. Moreover, we com- 
pare some experimentally obtained spectra to the pre- 
dictions of the models to see which features can be ex- 
plained by these models and which cannot. In the next 
section we begin by introducing the definitions of fitness 
landscapes and their amplitude spectra on more rigorous 
mathematical grounds. 



2. Fitness landscapes and their amplitude spectra 

2.1. Sequence space and epistasis 

The genotype of an organism is encoded in a 
sequence of letters taken from the alphabet $\mathcal{A} = \{T, C, G, A\}$ 
of nucleotide base pairs with cardinality $|\mathcal{A}| = 4$. A similar 
description applies to the space of proteins, where the cardinality 
of the encoding alphabet equals the number of amino acids [33]. Point mutations 
replace single letters by others, altering the sequence 
and therefore the properties of the organism. 

For simplicity, fitness landscapes are often defined 
on sequences composed of elements of some binary 
alphabet $\mathcal{A}_B$, where a common choice is $\mathcal{A}_B = \{0, 1\}$. 
In the present article we prefer the symmetric alphabet 
$\mathcal{A}_B = \{-1, 1\}$ for mathematical convenience [34]. Note 
that the elements of the binary alphabet do not gener- 
ally stand for bases or encoded proteins but can also in- 
dicate whether a particular (possibly complex) mutation 
is present in a gene or not. Therefore the restriction to 
single changes in the sequence does not imply that the 
treatment is limited to point mutations. 

All possible sequences of a given length $L$ constructed from the binary
alphabet $\mathcal{A}_B$ form a metric space called the Hamming space
$\mathbb{H}_2^L$, also known as the Boolean hypercube. Its metric is the
Hamming distance $d(\sigma, \sigma')$, which equals the number of single
mutational steps required to transform one sequence into the other. To
quantify the degree of adaptation or reproductive success of an organism
carrying the genotype $\sigma$, a real number $F$ called fitness is assigned
to the corresponding sequence according to

$$\sigma \mapsto F(\sigma). \qquad (2)$$



1 This model is better known as the NK-model. The designation in the current
article follows refs. [20, 21] and is motivated by consistently using L for
the total number of loci.



Figure 1: Illustration of the eigenfunctions $\phi_{i_1 \ldots i_p}$ of the
graph Laplacian for the binary hypercube with $L = 3$ and $p = 1, 2, 3$.
Similar to the usual Fourier decomposition on spaces such as $\mathbb{Z}^n$
or $\mathbb{R}^n$, eigenfunctions of higher order vary more rapidly.

To precisely define the different notions of epistasis introduced above, we
consider two sequences $\sigma, \sigma' \in \mathbb{H}_2^L$ with
$d(\sigma, \sigma') < L$. Let $\sigma = (\sigma_1, \ldots, \sigma_i, \ldots, \sigma_L)$
and $\sigma' = (\sigma'_1, \ldots, \sigma'_i, \ldots, \sigma'_L)$, and denote
the sequences with a mutation at the $i$-th locus by $\sigma^{(i)}$ and
$\sigma'^{(i)}$, respectively, with $\sigma^{(i)}_i = \sigma'^{(i)}_i = -\sigma_i$.
If $F(\sigma) - F(\sigma^{(i)}) \neq F(\sigma') - F(\sigma'^{(i)})$ for some
$i$, the fitness landscape is called epistatic. If
$\mathrm{sgn}(F(\sigma) - F(\sigma^{(i)})) = \mathrm{sgn}(F(\sigma') - F(\sigma'^{(i)}))$
the effect is called magnitude epistasis, while for
$\mathrm{sgn}(F(\sigma) - F(\sigma^{(i)})) = -\mathrm{sgn}(F(\sigma') - F(\sigma'^{(i)}))$
it is called sign epistasis. Furthermore, the landscape is said to contain
reciprocal sign epistasis if there are pairs of mutations such that
$-\mathrm{sgn}(F(\sigma) - F(\sigma^{(i,j)})) = \mathrm{sgn}(F(\sigma) - F(\sigma^{(i)})) = \mathrm{sgn}(F(\sigma) - F(\sigma^{(j)}))$,
with $\sigma^{(i,j)}$ denoting the sequence mutated at loci $i$ and $j$ [24].
A landscape with sign epistasis is said to be rugged, while landscapes
containing no epistasis or only magnitude epistasis are called smooth.
Non-epistatic landscapes are also called additive, as here the individual
effects of mutations add up independently.
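The definitions above can be made concrete for the smallest non-trivial case, a pair of loci. The following is a minimal sketch, not taken from the original article: the function names are hypothetical, and the test for reciprocal sign epistasis uses the common "both mutations change sign on the other background" criterion, which is assumed here to be equivalent to the condition stated in the text.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def classify_pair(f00, f10, f01, f11):
    """Classify the interaction between two loci on a 2-locus landscape.

    f00 is the fitness of the background genotype, f10 and f01 are the
    two single mutants and f11 is the double mutant.
    """
    # Effect of mutating locus 1 on both backgrounds of locus 2, and vice versa.
    e1_bg0, e1_bg1 = f10 - f00, f11 - f01
    e2_bg0, e2_bg1 = f01 - f00, f11 - f10
    if e1_bg0 == e1_bg1:  # equivalent to e2_bg0 == e2_bg1
        return "additive"
    # A vanishing effect on one background counts as a sign change here.
    flip1 = sign(e1_bg0) != sign(e1_bg1)
    flip2 = sign(e2_bg0) != sign(e2_bg1)
    if flip1 and flip2:
        return "reciprocal sign"
    if flip1 or flip2:
        return "sign"
    return "magnitude"
```

Exact equality is used to detect additivity, so the sketch is best suited to integer or rational fitness values; for measured data a tolerance would be needed.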

The presence of sign epistasis severely limits which paths on the landscape
are accessible to evolution [17, 19, 20]. Landscapes with reciprocal sign
epistasis may contain several local fitness maxima [24], while those without
it have a single maximum. The existence of reciprocal sign epistasis is thus
a necessary but not sufficient condition for the existence of multiple local
maxima. For an example of a sufficient condition for multiple maxima based on
local properties of the landscape see [27].



2.2. Fourier decomposition 

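The adjacency matrix $\mathbf{A}$ of the Hamming space encodes the
neighborhood relations between sequences, and is defined as

$$\mathbf{A}_{\sigma\sigma'} = \begin{cases} 1, & d(\sigma, \sigma') = 1 \\ 0, & \text{else.} \end{cases} \qquad (3)$$

With $\mathbb{I}_m$ denoting the $m \times m$ identity matrix, the graph
Laplacian $\Delta$ is then defined by $\Delta = \mathbf{A} - L\,\mathbb{I}_{2^L}$,
and its action on the fitness function $F$ yields

$$\Delta F(\sigma) = \sum_{\sigma' \in \mathbb{H}_2^L} \mathbf{A}_{\sigma\sigma'} F(\sigma') - L F(\sigma) = \sum_{d(\sigma, \sigma') = 1} F(\sigma') - L F(\sigma). \qquad (4)$$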



For $\mathcal{A} = \mathcal{A}_B = \{-1, 1\}$ and $\sigma_i$ denoting the
$i$-th element of $\sigma$, the eigenfunctions of $\Delta$ are given by
$\phi_{i_1 \ldots i_p}(\sigma) = 2^{-L/2} \sigma_{i_1} \cdots \sigma_{i_p}$
with $p \in \{1, \ldots, L\}$ and $1 \le i_1 < i_2 < \cdots < i_p \le L$. The
corresponding eigenvalues are $\lambda_p = -2p$ and thus the degeneracy is
$\binom{L}{p}$. The set of all eigenfunctions $\phi_{i_1 \ldots i_p}(\sigma)$
forms an orthonormal basis and the landscape can be expressed in terms of a
decomposition, called Fourier expansion [28], which reads

$$F(\sigma) = \sum_{p=0}^{L} \sum_{i_1 \ldots i_p} a_{i_1 \ldots i_p} \phi_{i_1 \ldots i_p}(\sigma). \qquad (5)$$
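The eigenvalue statement can be checked by brute force on a small hypercube. The sketch below is an illustration only, with hypothetical function names: it applies the Laplacian of eq. (4) to every product of $p$ spins and compares the result with $-2p$ times the function. The normalization $2^{-L/2}$ is omitted because it does not affect the eigenvalue.

```python
from itertools import combinations, product


def laplacian_apply(F, L):
    """Apply the graph Laplacian of the L-dimensional hypercube to F.

    F maps each sequence (a tuple of +1/-1 of length L) to a number.
    """
    out = {}
    for s in F:
        # The L nearest neighbours differ from s at exactly one locus.
        neighbours = [s[:i] + (-s[i],) + s[i + 1:] for i in range(L)]
        out[s] = sum(F[n] for n in neighbours) - L * F[s]
    return out


def spin_product(loci):
    """Unnormalized eigenfunction: product of the spins at the given loci."""
    def phi(s):
        value = 1
        for i in loci:
            value *= s[i]
        return value
    return phi


def check_eigenvalues(L):
    """Return True if every spin product of order p has eigenvalue -2p."""
    space = list(product((-1, 1), repeat=L))
    for p in range(L + 1):
        for loci in combinations(range(L), p):
            phi = spin_product(loci)
            F = {s: phi(s) for s in space}
            dF = laplacian_apply(F, L)
            if any(dF[s] != -2 * p * F[s] for s in space):
                return False
    return True
```

The check works because flipping one of the $p$ loci in the product changes its sign, while flipping any of the remaining $L - p$ loci leaves it unchanged.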



See fig. 1 for the visualization of three eigenfunctions on an $L = 3$
hypercube. While the $a_{i_1}$'s contain the information about the relative
influence of the non-epistatic contributions on fitness, the higher order
coefficients $a_{i_1 \ldots i_p}$ with $p > 1$ describe the relative strength
of the contributions of $p$-tuples of interacting loci. The zero order
coefficient $a_0$ is proportional to the mean fitness of the landscape,

$$a_0 = 2^{-L/2} \sum_{\sigma \in \mathbb{H}_2^L} F(\sigma),$$

where the prefactor reflects the normalization of the $\phi_{i_1 \ldots i_p}$.

The amplitude spectrum quantifies the relative contributions of the complete
sets of $p$-tuples to the epistatic interactions. Following ref. [32], we
consider random field models of fitness landscapes where individual instances
of the ensemble (realizations) are constructed from random variables
according to some specified rule (see sects. 3 and 4), and define amplitude
spectra as averages over the realizations. Two kinds of averages appear:
averaging over realizations at a constant point in $\mathbb{H}_2^L$, and
spatially averaging over all points in $\mathbb{H}_2^L$. Here and in the
following angular brackets $\langle \ldots \rangle$ denote averaging over the
realizations of the landscape, while an overbar denotes a spatial average
over $\mathbb{H}_2^L$, as for example in

$$\overline{F} = 2^{-L} \sum_{\sigma} F(\sigma).$$

For the definition of the amplitude spectrum, again two types of averages
need to be distinguished. The first one reads

$$\tilde{B}_p = \left\langle \frac{\sum_{i_1 \ldots i_p} |a_{i_1 \ldots i_p}|^2}{\sum_{q \neq 0} \sum_{i_1 \ldots i_q} |a_{i_1 \ldots i_q}|^2} \right\rangle \qquad (6)$$

for $p > 0$ and $\tilde{B}_0 = 0$. For an additive landscape $\tilde{B}_1 = 1$
and $\tilde{B}_{\mathrm{sum}} = \sum_{i > 1} \tilde{B}_i = 0$, while for a
landscape with epistasis $\tilde{B}_{\mathrm{sum}} > 0$. In [17]
$\tilde{B}_{\mathrm{sum}}$ was used as a quantifier for the amount of
epistasis found in empirical fitness landscapes. Note that the values of
$\tilde{B}_{\mathrm{sum}}$ for different landscapes are comparable because of
the normalization $\sum_{p > 0} \tilde{B}_p = 1$.
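For a single measured landscape no ensemble average is available, and the spectrum is computed directly from the Fourier coefficients. The following is a minimal sketch under that assumption, with hypothetical function names: it evaluates every coefficient by direct summation (cost of order $4^L$, so only suitable for small $L$) and returns the weight of each interaction order among all orders $p > 0$.

```python
from itertools import product


def fourier_coefficients(F, L):
    """Return {tuple of loci: coefficient} for a landscape on {-1, 1}^L.

    F maps each sequence (tuple of +1/-1) to its fitness. The basis
    functions are 2^(-L/2) times the product of the spins at the loci.
    """
    norm = 2 ** (-L / 2)
    coeffs = {}
    for mask in product((0, 1), repeat=L):
        loci = tuple(i for i in range(L) if mask[i])
        total = 0.0
        for s, f in F.items():
            phi = 1
            for i in loci:
                phi *= s[i]
            total += f * phi
        coeffs[loci] = norm * total
    return coeffs


def amplitude_spectrum(F, L):
    """Weight of interaction order p among all orders p > 0, for p = 0..L.

    The entry for p = 0 is set to zero. A completely flat landscape has
    no weight at p > 0 and raises ZeroDivisionError.
    """
    power = [0.0] * (L + 1)
    for loci, a in fourier_coefficients(F, L).items():
        power[len(loci)] += a * a
    total = sum(power[1:])
    return [0.0] + [w / total for w in power[1:]]
```

For an additive landscape all weight sits at $p = 1$, so the sum of the entries with $p > 1$ gives a simple scalar measure of epistasis.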

Another way to define the amplitude spectrum is through

$$B_p = \frac{b_p}{\sum_{q \ge 0} b_q} \qquad (7)$$

with $b_p = \sum_{i_1 \ldots i_p} \langle |a_{i_1 \ldots i_p}|^2 \rangle$ for
all $p \ge 1$. The zero order coefficient $b_0$ is not defined in terms of
the Fourier coefficients $a_{i_1 \ldots i_p}$, but is proportional to the
mean covariance,

$$b_0 = 2^{-L} \sum_{\sigma, \sigma' \in \mathbb{H}_2^L} \mathrm{Cov}[F(\sigma), F(\sigma')], \qquad (8)$$

as defined in [32]. The main difference between the $\tilde{B}_p$ and the
$B_p$ consists in whether averaging is performed separately on the terms in
the fraction or on the fraction as a whole. As it is often easier to
calculate a fraction of averages than an average of a fraction, the present
work concentrates on the $B_p$. While the $B_p$ are not generally normalized
over the orders $p > 0$, that is $\sum_{p > 0} B_p \neq 1$, a normalized
amplitude spectrum can easily be constructed by dividing each $B_p$ with
$p > 0$ by $\sum_{q > 0} B_q$.
2.3. Relation to fitness correlations 

In ref. [32] it was shown that the differently averaged spectra are related
to different types of fitness correlation functions. The direct correlation
function is defined for all sequences of a given Hamming distance $d$ as

$$\rho_d = \frac{1}{\overline{F^2} - \overline{F}^2} \left[ \frac{1}{2^L \binom{L}{d}} \sum_{d(\sigma, \sigma') = d} F(\sigma) F(\sigma') - \overline{F}^2 \right]. \qquad (10)$$

This correlation function is linked to the normalized amplitude spectrum,
$\tilde{B}_p$, according to

$$\langle \rho_d \rangle = \sum_{p \ge 0} \tilde{B}_p\, \omega_p(d), \qquad (11)$$

where the $\omega_p$ are orthogonal functions depending on the underlying
graph structure [32]. On the other hand, the autocorrelation function $R_d$
defined as (see footnote 3)

$$R_d = \frac{\langle F(\sigma) F(\sigma') \rangle_d - \langle \overline{F} \rangle^2}{\langle \overline{F^2} \rangle - \langle \overline{F} \rangle^2}, \qquad (12)$$

where $\langle \ldots \rangle_d$ denotes a simultaneous average over all
possible pairs $(\sigma, \sigma')$ with $d(\sigma, \sigma') = d$ as well as
over the realizations of the landscape, is linked to the amplitude spectrum
$B_p$ according to [35]

$$R_d = \sum_{p \ge 0} B_p\, \omega_p(d). \qquad (13)$$

Again, the difference between eq. (13) and eq. (11) lies in how the averaging
is performed.

For the Boolean hypercube, the functions $\omega_p(d)$ are closely related to
the Krawtchouk polynomials $K_p(d)$,

$$\omega_p(d) = \binom{L}{p}^{-1} K_p(d),$$

where [36, 37]

$$K_p(d) = \sum_{j \ge 0} (-1)^j \binom{d}{j} \binom{L - d}{p - j}. \qquad (14)$$

Unless stated otherwise, here and in the rest of the article, binomial
coefficients are understood to be defined as

$$\binom{L}{k} = \begin{cases} \frac{L!}{k!\,(L - k)!}, & L \ge k \text{ and } L, k \ge 0, \\ 0, & \text{else.} \end{cases} \qquad (15)$$

Our primary interest is in the calculation of analytical expressions of the
$B_p$ for known $R_d$. Thus, an inversion of eq. (13) is needed. This can be
achieved by exploiting the orthogonality of the Krawtchouk polynomials with
respect to the binomial distribution, which implies that

$$\langle K_p, K_q \rangle = \sum_{d \ge 0} \binom{L}{d} K_p(d) K_q(d) = 2^L \binom{L}{p} \delta_{pq}. \qquad (16)$$



2 Note that the prefactor of $b_0$ given in [32] appears to be incorrect.

3 This is a slight variation of the autocorrelation function given in
ref. [32]. The original definition is restricted to landscape models
fulfilling $\langle F(\sigma) \rangle = \mathrm{const.}$, with a constant that
is independent of $\sigma$. The proof of Theorem 5 in [32] can be carried out
analogously for the definition (12) without suffering from this constraint.
Multiplying eq. (13) by $\binom{L}{d} K_q(d)$ and summing over $d$ thus yields

$$\sum_{d \ge 0} \binom{L}{d} R_d K_q(d) = \sum_{d \ge 0} \sum_{p \ge 0} \binom{L}{d} \binom{L}{p}^{-1} B_p K_p(d) K_q(d) = 2^L B_q, \qquad (17)$$

and we conclude that

$$B_q = 2^{-L} \sum_{d \ge 0} \binom{L}{d} R_d K_q(d). \qquad (18)$$

Thus the amplitude spectrum can be calculated, at least numerically, from
any given autocorrelation function. For some landscape models even exact
analytical solutions can be obtained, as will be shown in the following
sections.
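The numerical route mentioned above takes only a few lines. The sketch below is an illustration with hypothetical function names; it assumes the standard explicit form of the binary Krawtchouk polynomials, $K_p(d) = \sum_j (-1)^j \binom{d}{j} \binom{L-d}{p-j}$, and implements eq. (18) together with the forward relation (13).

```python
from math import comb


def binom(n, k):
    """Binomial coefficient that vanishes outside 0 <= k <= n, cf. eq. (15)."""
    return comb(n, k) if 0 <= k <= n else 0


def krawtchouk(p, d, L):
    """Krawtchouk polynomial K_p(d) on the L-dimensional binary hypercube."""
    return sum((-1) ** j * binom(d, j) * binom(L - d, p - j) for j in range(p + 1))


def spectrum_from_correlation(R, L):
    """Eq. (18): B_q = 2^-L * sum_d binom(L, d) * R[d] * K_q(d), q = 0..L.

    R is a sequence with R[d] the autocorrelation at Hamming distance d.
    """
    return [
        sum(binom(L, d) * R[d] * krawtchouk(q, d, L) for d in range(L + 1)) / 2 ** L
        for q in range(L + 1)
    ]


def correlation_from_spectrum(B, L):
    """Eq. (13): R_d = sum_p B[p] * K_p(d) / binom(L, p), d = 0..L."""
    return [
        sum(B[p] * krawtchouk(p, d, L) / binom(L, p) for p in range(L + 1))
        for d in range(L + 1)
    ]
```

As a quick sanity check, an uncorrelated landscape with `R = [1, 0, ..., 0]` should give back the binomial weights $2^{-L} \binom{L}{q}$, and applying the two functions in sequence should reproduce the input correlation function.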

3. Kauffman's LK-model 
Figure 2: The autocorrelation function (top) and the amplitude spectrum
(bottom) for the LK model with $L = 100$ and different values of $k = K + 1$.



The simplest random field model of a fitness landscape is the House-of-Cards
(HoC) model [38, 39]. In this model, the fitness values are assigned randomly
to genotypes according to

$$F(\sigma) = \xi(\sigma), \qquad (19)$$

where the $\xi(\sigma)$ are independent and identically distributed (i.i.d.)
random variables drawn from some distribution. Without loss of generality we
assume that the $\xi$ have vanishing mean, $\langle \xi \rangle = 0$, and
finite variance $D = \mathrm{Var}(\xi)$. The amplitude spectrum of the HoC
model is known to be $B_q = 2^{-L} \binom{L}{q}$ [32], which also follows from
eq. (18).

Although the HoC model has been widely used for the modeling of adaptation
[22, 23, 25], there is by now substantial experimental evidence that the
assumption of uncorrelated fitness values overestimates the ruggedness of
real fitness landscapes [17, 20, 40]. It is therefore necessary to consider
more complex models, which include fitness correlations in a biologically
meaningful way. A prototypical model with tunable ruggedness is Kauffman's LK
model [29, 30, 41], which assumes random epistatic interactions within groups
of loci of fixed size and fixed membership. In the classical version, each
locus $i$ interacts with a set of $K$ other loci
$\{\sigma_{i_1}, \ldots, \sigma_{i_K}\}$, which together with the locus
$\sigma_i$ itself constitute the LK-neighborhood of locus $i$.

To take into account more general setups, the constraint of $\sigma_i$ being
a member of the $i$-th neighborhood will be relaxed here [32]. Thus, defining
$k = K + 1$, the $i$-th LK-neighborhood is the set
$\{\sigma_{i_1}, \ldots, \sigma_{i_k}\}$. The fitness is assigned as follows:
Let $\{f_i\}$ be $L$ random functions with $K + 1 = k$ binary arguments. For
each of the $2^k$ combinations of the arguments, the
$f_i(\sigma_{i_1}, \ldots, \sigma_{i_k})$ are chosen as i.i.d. random
variables with variance $D$. The fitness landscape is then defined as

$$F(\sigma) = \frac{1}{\sqrt{L}} \sum_{i=1}^{L} f_i(\sigma_{i_1}, \ldots, \sigma_{i_k}). \qquad (20)$$

Thus, each $f_i$ is equivalent to a HoC landscape of size $K + 1 = k$. For
$K = L - 1$, respectively $k = L$, the landscape is maximally rugged and
reduces to the totally uncorrelated HoC model, while for $K = 0$,
respectively $k = 1$, all fitness contributions are independent, and the
model is fully additive. By changing $k$ the ruggedness of the fitness
landscape can be tuned.
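The construction can be sketched in a few lines for small $L$. The code below is an illustration, not the authors' implementation: the function names are hypothetical, neighborhoods are drawn at random (one of several possible choices), the contributions are taken to be standard Gaussian, and the $1/\sqrt{L}$ normalization is an assumption about the prefactor in eq. (20).

```python
import random
from itertools import combinations, product


def random_lk_landscape(L, k, rng):
    """Return {sequence: fitness} for one realization of the LK model."""
    # One random neighbourhood of k distinct loci per contribution f_i.
    neighbourhoods = [rng.sample(range(L), k) for _ in range(L)]
    # Each f_i is a lookup table with 2^k i.i.d. Gaussian entries.
    tables = [
        {args: rng.gauss(0.0, 1.0) for args in product((-1, 1), repeat=k)}
        for _ in range(L)
    ]
    F = {}
    for s in product((-1, 1), repeat=L):
        total = sum(
            tables[i][tuple(s[j] for j in neighbourhoods[i])] for i in range(L)
        )
        F[s] = total / L ** 0.5
    return F


def fourier_coefficient(F, loci, L):
    """Coefficient of the normalized spin product over the given loci."""
    total = 0.0
    for s, f in F.items():
        phi = 1
        for i in loci:
            phi *= s[i]
        total += f * phi
    return total * 2 ** (-L / 2)


def highest_interaction_order(F, L, tol=1e-9):
    """Largest p for which some order-p coefficient exceeds tol in magnitude."""
    highest = 0
    for p in range(1, L + 1):
        for loci in combinations(range(L), p):
            if abs(fourier_coefficient(F, loci, L)) > tol:
                highest = p
                break
    return highest
```

Since every contribution depends on only $k$ loci, `highest_interaction_order` should never exceed $k$, which gives a simple check of a generated landscape.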

To complete the definition of the model, it has to be specified how the
elements of the neighborhoods are chosen. In the most commonly used versions
of the model, the $k$ interacting loci are either picked at random or taken
to be adjacent along the sequence [29, 30]. A third possibility is to
subdivide the sequence into blocks of size $k$, such that within blocks every
locus interacts with every other but blocks are mutually independent
[42, 43]. Although the construction of the neighborhoods affects certain
properties of the landscapes such as the number of local fitness maxima
[44, 45] and the evolutionary accessibility of the global maximum [21, 46],
it turns out that the autocorrelation function does not depend on it. The
autocorrelation function of the LK model can be calculated starting from
eq. (12) and is given by [47]

$$R_d = \binom{L - k}{d} \binom{L}{d}^{-1}, \qquad (21)$$

see fig. 2. Note that previously some incorrect expressions for the
correlation functions have been reported in the literature [48], which led to
the erroneous conclusion that the choice of the neighborhood affects the
amplitude spectra [32].

Inserting (21) into eq. (18) yields

$$B_q = 2^{-L} \sum_{d \ge 0} K_q(d) \binom{L - k}{d}. \qquad (22)$$

The evaluation of this expression is somewhat technical and can be found in
Appendix A. The final result

$$B_q = 2^{-k} \binom{k}{q} \qquad (23)$$

is remarkably simple (see fig. 2 for illustration). As expected, the Fourier
coefficients vanish for $q > k$ [21, 49] and the known case of the HoC model
is reproduced for $k = L$. Moreover, the coefficients satisfy the symmetry
$B_q = B_{k-q}$ and are maximal for $q = k/2$, as was previously conjectured
in [32].
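The closed form can be cross-checked against the sum over Krawtchouk polynomials using exact rational arithmetic. This is a sketch with hypothetical function names, assuming the standard explicit form of $K_q(d)$; it does not replace the derivation in the appendix.

```python
from fractions import Fraction
from math import comb


def binom(n, k):
    """Binomial coefficient that vanishes outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def krawtchouk(q, d, L):
    """Krawtchouk polynomial K_q(d) on the L-dimensional binary hypercube."""
    return sum((-1) ** j * binom(d, j) * binom(L - d, q - j) for j in range(q + 1))


def lk_spectrum_from_sum(L, k):
    """Spectrum of the LK model as 2^-L * sum_d K_q(d) * binom(L - k, d)."""
    return [
        Fraction(sum(krawtchouk(q, d, L) * binom(L - k, d) for d in range(L + 1)), 2 ** L)
        for q in range(L + 1)
    ]


def lk_spectrum_closed_form(L, k):
    """Closed form 2^-k * binom(k, q) for q = 0..L."""
    return [Fraction(binom(k, q), 2 ** k) for q in range(L + 1)]
```

Both functions return lists of `Fraction` objects, so they can be compared with `==` without any rounding tolerance.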

The LK model is already a very flexible model and offers many possibilities
for tuning. An even more general model is obtained by considering
superpositions of LK models, in the sense of LK fitness landscapes being
added independently. Let
$\{F_m(\sigma) = \frac{1}{\sqrt{L}} \sum_{j=1}^{L} f_j^{(m)}(\sigma_{j_1}, \ldots, \sigma_{j_{k_m}})\}$
be a family of $n$ LK fitness landscapes with neighborhood sizes $k_m$,
$m = 1, \ldots, n$. Then its superposition $\mathcal{T}$ is defined by

$$\mathcal{T} : \sigma \mapsto \sum_{m=1}^{n} F_m(\sigma) = \sum_{m=1}^{n} \frac{1}{\sqrt{L}} \sum_{j=1}^{L} f_j^{(m)}(\sigma_{j_1}, \ldots, \sigma_{j_{k_m}}). \qquad (24)$$



Since the different LK landscapes $\{F_m\}$ are independent, the correlation
functions are additive,

$$R_d = \sum_{i=0}^{L} A_i \binom{L - i}{d} \binom{L}{d}^{-1}, \qquad (25)$$

with statistical weights

$$A_i = \frac{\sum_{\{m \,|\, k_m = i\}} D_m}{\sum_{j=1}^{n} D_j},$$

where $D_m = \mathrm{Var}(f^{(m)})$ and the sum in the numerator is over all
landscapes with neighborhoods of size $i$. The amplitude spectrum of the
superposition is thus of the form

$$B_q = \sum_{i \ge 0} A_i\, 2^{-i} \binom{i}{q}. \qquad (26)$$

Note that the consistent interpretation of an empirical fitness landscape as
a superposition of LK landscapes requires all $A_i$ to be positive.
Nevertheless, it can be useful to consider superpositions containing negative
$A_i$ to calculate amplitude spectra of fitness landscapes constructed by
different means (see section 4 for an example).

Interestingly, expression (26) is also obtained from another type of
generalized LK-model, giving rise to a different biological interpretation of
the decomposition. Consider again fitness values $F(\sigma)$ that are
constructed as sums of fitnesses corresponding to HoC landscapes associated
to LK-like neighborhoods,

$$F(\sigma) = \sum_{i=1}^{M} f_i(\sigma_{i_1}, \ldots, \sigma_{i_{k^{(i)}}}), \qquad (27)$$

where $M$ is an integer that can be different from $L$, and $k^{(i)}$ is the
size of the $i$-th neighborhood, drawn from some distribution $P(k)$.
Furthermore, for simplicity assume that the variances $D_i$ of the $f_i$ are
all the same. The reasoning behind this model is to retain the idea of
interacting groups of loci that is inherent in the LK model, but to relax the
rather unrealistic condition that all these groups are of the same size.
Rather, it is assumed that there exists some typical distribution for the
sizes of these groups.



Following the procedure explained in [47], the corresponding autocorrelation
function is easily shown to be

$$R_d = \sum_{k \ge 0} P(k) \binom{L - k}{d} \binom{L}{d}^{-1}, \qquad (28)$$

which trivially leads to expression (26) with $A_k = P(k)$. The coefficients
obtained from the decomposition of experimentally obtained spectra in terms
of LK spectra could therefore also be interpreted as a probability
distribution for the sizes of interacting neighborhoods. Again, this
interpretation is only consistent if all weights are positive. Here, it seems
reasonable to expect that for large enough landscapes $P(k)$ should become
continuous in the sense that the distribution becomes monotonic over large
contiguous parts of its support.



4. Rough Mount Fuji model 

Another model with tunable epistatic effects is the Rough Mount Fuji (RMF)
model [31], which is constructed by superimposing a purely additive model and
a HoC landscape according to

$$F : \sigma \mapsto f_0 + \sum_{i=1}^{L} b_i \sigma_i + \xi(\sigma). \qquad (29)$$

In ref. [31], $f_0$ and the $b_i$ were parameters to be determined
empirically from experimental data. Here we instead choose $f_0$ as some
arbitrary constant, the $b_i$ as $L$ i.i.d. random variables, and
$\xi(\sigma)$ as another set of $2^L$ i.i.d. random variables with
$\langle \xi(\sigma) \rangle = 0$ and
$\langle \xi(\sigma) \xi(\sigma') \rangle = D_L \delta_{\sigma\sigma'}$,
compare to the construction of the HoC model above in sect. 3. Note that, in
contrast to the $\xi(\sigma)$, the $b_i$ do not depend on $\sigma$. The
amount of ruggedness is controlled by fixing the variance of the HoC
component, $D_L$, and the mean of the absolute values of the slopes of the
additive model, $s = \sum_{i=1}^{L} |b_i| / L$. The important limiting cases,
the HoC model and the purely additive model, are obtained in the limits
$D_L / s \to \infty$ and $D_L / s \to 0$, respectively [17].

In the following we write $b_i = \frac{c}{2} + \xi_i$, where $c$ is a constant
independent of $i$, and the $\xi_i$ are i.i.d. random variables with
$\langle \xi_i \rangle = 0$ and $\langle \xi_i \xi_j \rangle = D_1 \delta_{ij}$.
Note that choosing the same mean value for all the $b_i$'s singles out the
reference sequence $\sigma^{(0)} = (1, \ldots, 1)$. On average, the fitness of
sequence $\sigma$ decays linearly with the Hamming distance
$d(\sigma, \sigma^{(0)})$ to the reference sequence $\sigma^{(0)}$ and the
mean slope is $c$. Setting $D_1 = 0$ yields a simpler version of the RMF model
that was introduced in [20].

To calculate the autocorrelation function of the RMF model, it is convenient
to rewrite the fitness as
$F(\sigma) = a - c\,d(\sigma, \sigma^{(0)}) + \sum_{i=1}^{L} \xi_i \sigma_i + \xi(\sigma)$,
where $a = f_0 + \frac{cL}{2}$. Making use of the vanishing mean values of the
$\xi_i$'s and the $\xi(\sigma)$'s, the autocorrelation function reads

$$R_d^{\mathrm{RMF}} = \frac{\left\langle \sum_{i,j} \xi_i \xi_j \sigma_i \sigma'_j \right\rangle_d + \langle \xi(\sigma) \xi(\sigma') \rangle_d + \left\langle (a - c\,d(\sigma, \sigma^{(0)}))(a - c\,d(\sigma', \sigma^{(0)})) \right\rangle_d - \left( \overline{a - c\,d(\sigma, \sigma^{(0)})} \right)^2}{\overline{(a - c\,d(\sigma, \sigma^{(0)}))^2} - \left( \overline{a - c\,d(\sigma, \sigma^{(0)})} \right)^2 + \left\langle \left( \sum_i \xi_i \sigma_i \right)^2 \right\rangle + \langle \xi(\sigma)^2 \rangle}.$$

The covariance of the deterministic part has been evaluated in [26] and the
terms containing random variables can easily be calculated, yielding

$$R_d^{\mathrm{RMF}} = \frac{\left( D_1 + \frac{c^2}{4} \right)(L - 2d) + D_L \delta_{d0}}{\left( D_1 + \frac{c^2}{4} \right) L + D_L}.$$

Figure 3: The autocorrelation function (top) and the amplitude spectrum
(bottom) for the RMF model with $L = 100$, $D_1 = 0$, $D_L = 1$ and various
values of $c$ (legend: $c = 0.1$, $c = 0.5$, $c = 5$).

In order to obtain the spectrum $B_p$ we write $R_d^{\mathrm{RMF}}$ as a
linear combination of correlation functions of the LK model with different
$k$'s, i.e. $R_d^{\mathrm{RMF}} = \sum_{k=0}^{L} A_k \binom{L - k}{d} / \binom{L}{d}$
with expansion coefficients

$$A_0 = -\frac{\left( D_1 + \frac{c^2}{4} \right) L}{\left( D_1 + \frac{c^2}{4} \right) L + D_L}, \quad A_1 = \frac{2 \left( D_1 + \frac{c^2}{4} \right) L}{\left( D_1 + \frac{c^2}{4} \right) L + D_L}, \quad A_L = \frac{D_L}{\left( D_1 + \frac{c^2}{4} \right) L + D_L}, \qquad (30)$$

and $A_k = 0$ for all other $k$'s. The $B_p$ can now be calculated making use
of the linearity of equation (18), yielding

$$B_p^{\mathrm{RMF}} = \frac{\left( D_1 + \frac{c^2}{4} \right) L\, \delta_{p1} + D_L\, 2^{-L} \binom{L}{p}}{\left( D_1 + \frac{c^2}{4} \right) L + D_L}. \qquad (31)$$

Figure 4: The amplitude spectrum $B_p$ (main) and the renormalized spectrum
$\beta_p$ (inset) for exponentially decaying fitness correlations. In the
inset, the exponential decay is obvious.

In fig. 3 autocorrelation functions and amplitude spectra for the RMF model
with $D_1 = 0$ and various choices of $c$ are shown. Note that the generality
of the superposition ansatz made it possible to calculate the $B_p$ for the
RMF model, although the relation to the LK model is not obvious at first
sight. Having in mind that the zeroth component does not contain information
about epistasis, we adopt, for the rest of this article, a more general
definition of RMF landscapes as superpositions of LK landscapes with all
components being equal to zero, except for $A_1 > 0$, $A_L > 0$, and an
arbitrary zeroth order coefficient $A_0$ that may be of any sign.
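The RMF spectrum can be verified numerically in the same way as for the LK model. The sketch below uses hypothetical function names and assumes the forms $R_d = [x (L - 2d) + D_L \delta_{d0}] / (x L + D_L)$ and $B_p = [x L \delta_{p1} + D_L 2^{-L} \binom{L}{p}] / (x L + D_L)$ with $x = D_1 + c^2/4$; it inverts the correlation function with exact fractions and compares the result to the closed form.

```python
from fractions import Fraction
from math import comb


def binom(n, k):
    """Binomial coefficient that vanishes outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def krawtchouk(q, d, L):
    """Krawtchouk polynomial K_q(d) on the L-dimensional binary hypercube."""
    return sum((-1) ** j * binom(d, j) * binom(L - d, q - j) for j in range(q + 1))


def rmf_correlation(L, c, D1, DL):
    """Assumed RMF autocorrelation function R_d for d = 0..L."""
    x = D1 + Fraction(c) ** 2 / 4
    denom = x * L + DL
    return [(x * (L - 2 * d) + (DL if d == 0 else 0)) / denom for d in range(L + 1)]


def spectrum_from_correlation(R, L):
    """B_q = 2^-L * sum_d binom(L, d) * R[d] * K_q(d) for q = 0..L."""
    return [
        sum(binom(L, d) * R[d] * krawtchouk(q, d, L) for d in range(L + 1)) / 2 ** L
        for q in range(L + 1)
    ]


def rmf_spectrum_closed_form(L, c, D1, DL):
    """Assumed closed form of the RMF spectrum for p = 0..L."""
    x = D1 + Fraction(c) ** 2 / 4
    denom = x * L + DL
    return [
        (x * L * (1 if p == 1 else 0) + DL * Fraction(binom(L, p), 2 ** L)) / denom
        for p in range(L + 1)
    ]
```

Passing integers or `Fraction` objects for `c`, `D1` and `DL` keeps all arithmetic exact; floating-point inputs would require comparison with a tolerance instead.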

5. Exponentially decaying correlation functions 

The motivation for the present article is to identify 
typical features of amplitude spectra of fitness land- 
scapes and to make use of them for extracting infor- 
mation about the underlying biological system. In the 



preceding two sections we considered well-established 
statistical models of fitness landscapes and computed 
their spectra. As will be further illustrated in sect. 6, 
this analysis provides criteria to judge whether a mea- 
sured spectrum can be explained by these models or not 
and, if so, one can use the biological picture behind the 
model to try to interpret the findings. 

However, when faced with experimental data, none of 
the presented models may be general enough to give a 
good description. If this is the case, an alternative ansatz 
is to start with a presumably generic correlation function 
and calculate the corresponding spectrum, which can be 
compared to the data. This may then also guide the 
search for improved models. Here, we consider a correlation function that
decays exponentially with Hamming distance $d$,

$$R_d^{\mathrm{exp}} = \lambda^d, \qquad (32)$$

with $0 < \lambda < 1$. The resulting expression for the spectrum obtained
from eq. (18),

$$B_q = 2^{-L} \sum_{d \ge 0} \binom{L}{d} K_q(d)\, \lambda^d, \qquad (33)$$

is most easily evaluated using the known form of the generating function of
the Krawtchouk polynomials [50]

$$\mathcal{K}(x, z) = \sum_{n \ge 0} K_n(x) z^n = (1 - z)^x (1 + z)^{L - x} \qquad (34)$$

and the fact that these polynomials are self-dual in the sense of [50]

$$\binom{L}{x} K_n(x) = \binom{L}{n} K_x(n). \qquad (35)$$

Indeed, inserting (35) into (33) and using (34) yields

$$B_q = 2^{-L} \binom{L}{q} (1 - \lambda)^q (1 + \lambda)^{L - q}. \qquad (36)$$

Defining $\kappa = \ln\left( \frac{1 + \lambda}{1 - \lambda} \right)$ this
expression can be rewritten as

$$B_q = \binom{L}{q} \frac{e^{-\kappa q}}{(1 + e^{-\kappa})^L}, \qquad (37)$$

corresponding to $R_d^{\mathrm{exp}} = \left( 1 - \frac{2}{1 + e^{\kappa}} \right)^d$.
We conclude that if the spectrum normalized with respect to the number of
$q$-tuples, $\beta_q = B_q / \binom{L}{q}$, decays exponentially with $q$,
then the correlations decay exponentially with distance on the hypercube, see
fig. 4.
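The result for exponentially decaying correlations can also be confirmed with exact arithmetic. The following sketch (hypothetical function names, standard explicit form of the Krawtchouk polynomials assumed) compares the Krawtchouk sum for $R_d = \lambda^d$ with the closed form $2^{-L} \binom{L}{q} (1 - \lambda)^q (1 + \lambda)^{L - q}$ and computes the ratios of successive renormalized coefficients, which should all equal $(1 - \lambda)/(1 + \lambda) = e^{-\kappa}$.

```python
from fractions import Fraction
from math import comb


def binom(n, k):
    """Binomial coefficient that vanishes outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def krawtchouk(q, d, L):
    """Krawtchouk polynomial K_q(d) on the L-dimensional binary hypercube."""
    return sum((-1) ** j * binom(d, j) * binom(L - d, q - j) for j in range(q + 1))


def exponential_spectrum_from_sum(L, lam):
    """B_q = 2^-L * sum_d binom(L, d) * K_q(d) * lam^d for q = 0..L."""
    return [
        sum(binom(L, d) * krawtchouk(q, d, L) * lam ** d for d in range(L + 1)) / 2 ** L
        for q in range(L + 1)
    ]


def exponential_spectrum_closed_form(L, lam):
    """B_q = 2^-L * binom(L, q) * (1 - lam)^q * (1 + lam)^(L - q)."""
    return [
        Fraction(binom(L, q), 2 ** L) * (1 - lam) ** q * (1 + lam) ** (L - q)
        for q in range(L + 1)
    ]


def renormalized_ratios(B, L):
    """Ratios beta_{q+1} / beta_q with beta_q = B_q / binom(L, q)."""
    beta = [B[q] / binom(L, q) for q in range(L + 1)]
    return [beta[q + 1] / beta[q] for q in range(L)]
```

With `lam` given as a `Fraction`, a constant ratio list is an exact statement that $\beta_q$ decays geometrically in $q$.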

Although we are, at the moment, lacking simple 
stochastic models that produce exponentially decaying 
correlations, spectra of the form (37) have recently been 
found for fitness landscapes obtained from a dynamical 
model of molecular signal transduction [51]. It would 
be interesting to see whether one can construct stochas- 
tic models that do not enter too deeply into the dynam- 
ics at the cellular level but contain a simple and generic 
mechanism that gives rise to such correlations. 

Exponentially decaying correlations have also been 
reported in a recent large-scale study of the fitness land- 
scape of HIV-1 [52]. However, the correlation function 
calculated in that article is different from the one stud- 
ied here, as it averages over correlations between fitness 
values of mutants that are connected by random walks 
of some length s and not over fitness values correspond- 
ing to states separated by Hamming distance d. Such 
random walk correlation functions are also connected 
in a simple manner to the amplitude spectra [35], but 
the relation is different from the one considered here. 
Therefore our results are not directly applicable to these 
observations. 



6. Experimentally obtained fitness landscapes 

In this section we compare the model spectra to sev- 
eral experimentally measured "fitness" landscapes. The 
quotation marks indicate that not all of the landscapes 
presented here actually correspond to fitness, but rather 
to some proxy of it. To be able to compare spectra, 
the landscapes should be as large and as complete as 
possible. The four landscapes considered are a six lo- 
cus landscape obtained by Hall et al. for the yeast 
Saccharomyces cerevisiae, an eight locus landscape for 
the fungus Aspergillus niger presented in Franke et al. 
[20], and two nine locus landscapes for the plant Nico- 
tiana tabacum given in O'Maille et al. [8]. A compar- 
ative analysis of these (and other) empirical landscapes 
can be found in [17]. All spectra presented in this sec- 
tion were calculated directly by decomposing the fitness 
landscapes in terms of the eigenfunctions of the graph 
Laplacian. 
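The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: it assumes that genotypes are indexed by the binary representation of the list position, that the eigenfunctions of the hypercube graph Laplacian are the Walsh functions, and that the normalized amplitude of order p is the summed squared Walsh coefficient of order p divided by the total over p >= 1. The function names are hypothetical.

```python
def walsh_coefficients(fitness):
    """Fast Walsh-Hadamard transform of 2**L fitness values, scaled by 2**-L.
    The list index s of each coefficient encodes the set of loci involved."""
    a = list(fitness)
    n = len(a)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return [v / n for v in a]


def amplitude_spectrum(fitness):
    """Normalized amplitudes for interaction orders p = 1..L (assumed normalization)."""
    n = len(fitness)
    L = n.bit_length() - 1
    if n != 1 << L:
        raise ValueError("need 2**L fitness values (complete landscape)")
    B = [0.0] * (L + 1)
    for s, c in enumerate(walsh_coefficients(fitness)):
        B[bin(s).count("1")] += c * c  # order p = number of loci in the term
    total = sum(B[1:])
    if total == 0:
        raise ValueError("constant landscape has no spectrum for p >= 1")
    return [b / total for b in B[1:]]
```

For a purely additive landscape all weight falls on p = 1, whereas a landscape given by the parity of the genotype puts all weight on p = L.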

While the first two landscapes mentioned above mea- 
sure growth rate as a quantifier of fitness, the landscapes 
presented in [8] measure enzymatic specificity of terpene
synthases, that is, the relative production of 5-epi-aristolochene
and premnaspirodiene, respectively. As only 418 of the 512
fitness values were measured for these landscapes, the missing
data are estimated by fitting a multidimensional linear model [53]
to the measured landscape. The fitness values of states for which
there are no measurements are then replaced by the values
given by the linear model. In contrast, for the A. niger
landscape considered in [20], missing fitness values
were argued to correspond to non-viable mutants
and are therefore set to zero. The way of estimating
missing values obviously affects the spectra, but some
estimation is necessary to be able to carry out the analysis.

We now ask whether the experimental spectra can be 
expressed as superpositions of LK-spectra of the form
(26) (recall that the RMF model is a particular case of
such a superposition). Of course, such a decomposition
is always possible, but the assumption that the biological
mechanism responsible for the spectra is really the
additive interplay of fixed groups of loci of characteristic
sizes is only reasonable if all the coefficients $A_j$ are
non-negative.

Simply solving the linear system of equations (26)
generally yields several negative coefficients. More satisfactory
results are obtained by fitting a function of the
form (26) to the data by means of a least-squares procedure,
constraining the coefficients to non-negative values.
Here, two ansatzes are considered. First, a fit containing
all coefficients is carried out, with none of the $A_j$ fixed to
zero a priori. This is done to check whether a superposition
of type (28) with a continuous neighborhood size
distribution P(k) is appropriate. Second, sparse fits containing
as few nonzero $A_j$'s as possible are carried out to
verify whether the landscape could be biologically interpreted
as a superposition of a small number of LK landscapes
of different interaction ranges. One way of selecting
$A_j$'s that can be neglected in the fit is to identify those
coefficients obtained in the full fit that are much smaller
than the others. In all cases, the term proportional to
$A_0$ in (26) is not considered, as it can always be trivially
fixed to fit $B_0$.
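To illustrate why the unconstrained solution can produce negative weights, the following sketch solves the linear system exactly by back-substitution. It assumes that the superposition spectrum has the form $B_p = \sum_j A_j 2^{-j} \binom{j}{p}$ for p = 1..L, which is consistent with the single-model result derived in Appendix A; the function names are hypothetical, and the constrained least-squares fit itself is not reproduced here.

```python
from fractions import Fraction
from math import comb


def lk_superposition_spectrum(A):
    """B_p for p = 1..L from weights A[j-1] (neighbourhood sizes j = 1..L),
    assuming B_p = sum_j A_j * 2**-j * C(j, p)."""
    L = len(A)
    return [
        sum(Fraction(A[j - 1]) * comb(j, p) / 2**j for j in range(p, L + 1))
        for p in range(1, L + 1)
    ]


def solve_lk_weights(B):
    """Exact inverse of the map above. The system is triangular because
    C(j, p) = 0 for j < p, so A_L is fixed first and the rest follow."""
    L = len(B)
    A = [Fraction(0)] * L
    for p in range(L, 0, -1):
        tail = sum(A[j - 1] * comb(j, p) / 2**j for j in range(p + 1, L + 1))
        A[p - 1] = (Fraction(B[p - 1]) - tail) * 2**p
    return A
```

A spectrum generated from non-negative weights is recovered exactly, while an arbitrary spectrum such as (1/2, 1/10, 2/5) for L = 3 gives a negative $A_2$. In practice a non-negative least-squares routine would replace the exact solve for the constrained fit.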

In Fig. 5 the data for the normalized amplitudes $B^*_p$
(black dots) are shown together with the fit (green curve)
and the HoC component $\propto \binom{L}{p}$ (red dashed line) of the
fit. For the A. niger landscape in [20], error estimates for
the fitness values were available [54], enabling the calculation
of error bars for the spectrum. This is done by
constructing ensembles of landscapes with fitness values
$F(\sigma) = \langle F(\sigma) \rangle + \xi(\sigma)$, where $\langle F(\sigma) \rangle$ is the mean
of the replicate experimental measurements of the fitness
of genotype $\sigma$ and the $\xi(\sigma)$ are normally distributed
random numbers with $\sigma$-dependent standard deviations
obtained from the replicate measurements. Note that
the influence of the measurement errors on the spectra
is very small and only exceeds the symbol size for the
highest p component (p = 8). At least for this case one
can therefore safely exclude that the HoC component of
the spectrum is generated by measurement errors.
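The ensemble construction can be sketched as follows. This is a minimal illustration under stated assumptions: the spectrum is computed with a direct Walsh transform and normalized over p >= 1, the error bar is taken to be the sample standard deviation across the ensemble, and the function names, sample count and seed are hypothetical choices rather than those of the original analysis.

```python
import random
from statistics import mean, stdev


def normalized_spectrum(fitness):
    """Direct (non-fast) Walsh transform; normalized amplitudes for p = 1..L."""
    n = len(fitness)
    L = n.bit_length() - 1
    B = [0.0] * (L + 1)
    for s in range(n):
        coeff = sum(f * (-1) ** bin(g & s).count("1")
                    for g, f in enumerate(fitness)) / n
        B[bin(s).count("1")] += coeff * coeff
    total = sum(B[1:])
    return [b / total for b in B[1:]]


def spectrum_error_bars(mean_fitness, sd_fitness, n_samples=200, seed=0):
    """Resample F = <F> + xi with genotype-dependent Gaussian noise and
    return the per-order mean and standard deviation of the spectra."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        noisy = [rng.gauss(m, s) for m, s in zip(mean_fitness, sd_fitness)]
        samples.append(normalized_spectrum(noisy))
    per_order = list(zip(*samples))
    return [mean(c) for c in per_order], [stdev(c) for c in per_order]
```

With all standard deviations set to zero the ensemble collapses onto the spectrum of the mean landscape, which gives a simple consistency check.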



Figure 5: Spectra corresponding to various experimentally measured fitness landscapes, shown as normalized amplitudes against p (logarithmic vertical axis). Panels: (a) Hall et al. 2010; (b) Franke et al. 2011; (c) O'Maille et al. 2008, relative 5-epi-aristolochene output; (d) O'Maille et al. 2008, relative premnaspirodiene output. Each panel shows the data, the fit, and the HoC component (from the fit). The green lines are obtained by fitting the spectrum of a
superposition of LK models to the data. The dashed red line is proportional to $\binom{L}{p}$, showing the spectrum expected for a HoC component.



As can be seen in Fig. 5(a), the spectrum of the yeast
landscape [11] is nicely fitted by an ansatz where only
$A_1$ and $A_L$ are assumed to be different from zero. This is
evidently a superposition of an additive and a HoC landscape
and therefore an RMF landscape. Only the value at
p = L seems too small to be fitted by the model. However,
this value corresponds to a single component of
the decomposition © and the large deviation may be 
due to the lack of averaging. For the A. niger landscape
from [20], a good and sparse fit with nonzero coefficients
$A_1$, $A_2$, and $A_L$ is also obtained (see Fig. 5(b)). The
significant value of $A_2$ implies that there are important
interactions between pairs of loci. An RMF landscape is
therefore not an appropriate model of this system. Note
that this conclusion differs from the analysis presented
in [20], where a reasonable fit to the RMF model was
found for a particular epistasis measure, the number of
accessible pathways. This illustrates the importance of
using more than one topographic measure for the comparison
between empirical and model landscapes [17].

For the spectrum of the 5-epi-aristolochene N. tabacum
landscape from [8], the fitting yields reasonable results
for an ansatz allowing only $A_1$, $A_2$, $A_6$ and $A_L$ to be
different from zero (see Fig. 5(c)). This might indicate that,
apart from the non-epistatic part and the simple pair interactions,
there are one or several groups consisting of
6 strongly interacting alleles. Using the same ansatz for
the premnaspirodiene landscape yields less convincing
results, as the large-p part of the spectrum seems to be
poorly fitted (see Fig. 5(d)). Introducing more components
into the fitting ansatz yields better results for this
part of the spectrum, but such ansatzes can hardly be
considered sparse anymore.

Using the full ansatz to fit the different landscapes 
does not yield any qualitative improvement for the first 
three landscapes and provides no evidence for an under- 
lying continuous neighborhood size distribution P(k). 
Only for the premnaspirodiene N. tabacum landscape
does the fit for the spectrum improve notably, but the 
obtained spectrum does not support the idea of a con- 
tinuous distribution of neighborhood sizes (not shown). 
In general, such a continuous distribution is more likely 
to emerge for larger landscapes than the relatively small 
data sets considered here, which suffer from insufficient 
averaging over groups of loci of different sizes.

One should be aware that failing to obtain a reason- 
able decomposition of an empirical landscape in terms 
of LK spectra does not a priori rule out the possibility 
that the landscape is in fact shaped by the mechanisms 
assumed by a superposition of LK-models. For example,
the failure may be due to an inappropriate fitness
measure, in the following sense. Suppose that there
exists a fitness proxy, $F'(\sigma)$, whose decomposition in
terms of LK landscapes is sparse, but the proxy actually
measured in experiments is $F = G(F')$, with G being
some nonlinear function. The decomposition of F may
then no longer be sparse, and the biological mechanism
that shapes the landscape may be obscured.
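A small worked example, not taken from the data discussed here, may make this concrete. Take two loci with $\sigma_i \in \{0, 1\}$, write $s_i = 1 - 2\sigma_i$, and assume purely for illustration that $G(x) = x^2$ and that the amplitude of order p is the sum of squared expansion coefficients of order p.

```latex
% Hypothetical two-locus example with s_i = 1 - 2\sigma_i \in \{+1, -1\}
F'(\sigma) = \sigma_1 + \sigma_2
           = 1 - \tfrac{1}{2} s_1 - \tfrac{1}{2} s_2 ,
\qquad
F(\sigma) = F'(\sigma)^2
          = \sigma_1 + \sigma_2 + 2 \sigma_1 \sigma_2
          = \tfrac{3}{2} - s_1 - s_2 + \tfrac{1}{2} s_1 s_2 .
```

The additive proxy $F'$ has all of its weight at p = 1, whereas $F$ acquires a pairwise term: its first-order weight is 1 + 1 = 2 and its second-order weight is 1/4, so the normalized amplitudes under the stated assumption are 8/9 and 1/9.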

Finally, it was checked whether any of the spectra are 
compatible with the expression (37) corresponding to an
exponentially decaying correlation function, but no rea- 
sonable correspondence was found. Of course, this does 
not allow for the conclusion that exponentially decaying 
correlations are an unrealistic assumption. Possibly, it 
may again be necessary to go to larger landscape sizes 
to see such behavior. Also, the way in which the mu- 
tations constituting the landscape are selected may have 
an influence on the observed correlations (see e.g. [17]).



7. Conclusions 

Exploiting the connection between amplitude spec- 
tra and fitness autocorrelation functions of fitness land- 
scapes over the Boolean hypercube, the amplitude spec- 
trum of Kauffman's LK model was calculated exactly 
and found to be of the simple form (23). By superimposing
LK landscapes, the spectra of RMF-type models
could also be obtained. In addition, an LK-like model
with a distribution P(k) of neighborhood sizes was introduced
and its spectrum was calculated. Such an extension
of the LK-model is reasonable, because it cannot
be assumed in general that every locus interacts with
the same number of other loci. This model thus offers 
more flexibility to fit experimental data. As a last exam- 
ple, the spectrum of a model with exponentially decay- 
ing correlations was computed. 

The HoC, RMF and LK models are frequently used 
for analyzing evolutionary processes, classifying fit- 
ness landscape properties and fitting experimental data. 
Therefore a lot of effort has been invested in the under- 
standing of these models, but the link to experimental 
data is still rather weak. The amplitude spectra calcu- 
lated in this article should facilitate quantitative com- 
parisons in future studies. The spectra contain a large 
amount of information about the landscape topography, 
and it is important to understand how the spectrum encodes
this information in order to be able to interpret
the spectra of measured fitness landscapes. As an exem- 
plary application of our results, four experimental land- 
scapes were fitted by means of the model spectra. Three 
of them could be fitted very nicely with sparse super- 
positions of LK models, while for the fourth one the 
obtained fit seems less convincing. In none of the cases
was evidence for a continuous neighborhood size distribution
P(k) found, which might be due to the small
sizes of the landscapes discussed in this article.

We claim that the fitting of amplitude spectra can 
be a useful tool for data analysis, but it has to be em- 
phasized that the spectra cannot be assigned to model 
landscapes in a unique way. Also, the collection of 
models presented here is by no means exhaustive. Ob- 
taining analytical expressions for the amplitude spectra 
of other classes of fitness landscapes is desirable and 
should prove helpful in guiding the search for suitable 
models of experimental landscapes. 

Finally, it is important to mention that there are in- 
teresting and biologically relevant properties of fitness 
landscapes that cannot be obtained from their spectra, 
such as, for example, the number of local fitness max- 
ima and the number of selectively accessible pathways 
JfjlllcJ. While it was shown in ref. [17] that the ruggedness
measure $B_{\mathrm{sum}}$ based on the Fourier decomposition
correlates with both quantities, there is no strict correspondence
between these measures of epistatic interactions.
Amplitude spectra do not distinguish between different
kinds of epistasis, i.e. magnitude, sign, or reciprocal
sign epistasis, in a qualitative way. Therefore, if one
is interested in this distinction, other epistasis measures 
have to be included in the analysis. 
Appendix A. Fourier spectrum of the LK-model

To evaluate the expression (22), an alternative but
equivalent formulation for the Krawtchouk polynomials 
is needed. With Q 



$$K_q(d) = \sum_{i \ge 0} (-2)^i \binom{d}{i} \binom{L-i}{q-i}$$



we obtain

$$\sum_{d \ge 0} \binom{L-k}{d} K_q(d) = \sum_{i \ge 0} \sum_{d \ge 0} (-2)^i \binom{d}{i} \binom{L-i}{q-i} \binom{L-k}{d}.$$
The summation over d can be carried out using the identity JH

$$\sum_{d \ge 0} \binom{d}{i} \binom{L-k}{d} = 2^{L-k-i} \binom{L-k}{i},$$



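which yields

$$\sum_{d \ge 0} \binom{L-k}{d} K_q(d) = 2^{L-k} \sum_{i \ge 0} (-1)^i \binom{L-i}{q-i} \binom{L-k}{i}.$$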

At this point we relax the condition (fT~5T > of positivity on 
the entries of the binomial coefficients. This allows us 
to perform an 'upper negation' [56] in the first binomial 
factors in eq. dA.U . 



$$\binom{L-i}{q-i} = (-1)^{q-i} \binom{q-L-1}{q-i}.$$



The remaining sum over i can now be evaluated using
the Vandermonde identity [56],

$$2^{-L} \sum_{d \ge 0} \binom{L-k}{d} K_q(d) = 2^{-k} (-1)^q \binom{q-k-1}{q},$$



and with another upper negation we arrive at the final
result (23).
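The chain of identities used in this appendix can be checked numerically with exact integer arithmetic. The sketch below assumes that the quantity being evaluated is $2^{-L} \sum_d \binom{L-k}{d} K_q(d)$ and that the final result is $2^{-k} \binom{k}{q}$, which is what the two upper negations and the Vandermonde identity give; the function names are hypothetical.

```python
from math import comb


def krawtchouk(L, q, d):
    """K_q(d) in the alternative form sum_i (-2)**i C(d, i) C(L-i, q-i)."""
    return sum((-2) ** i * comb(d, i) * comb(L - i, q - i)
               for i in range(min(d, q) + 1))


def lk_amplitude_sum(L, k, q):
    """Return (numerator, 2**L) of 2**-L * sum_d C(L-k, d) K_q(d),
    kept as integers so the comparison below is exact."""
    total = sum(comb(L - k, d) * krawtchouk(L, q, d) for d in range(L - k + 1))
    return total, 2**L


def matches_binomial_form(L, k, q):
    """True if the sum equals the assumed closed form 2**-k * C(k, q)."""
    total, denom = lk_amplitude_sum(L, k, q)
    return total * 2**k == comb(k, q) * denom
```

Looping `matches_binomial_form` over small L, all 1 <= k <= L and all 0 <= q <= L gives a quick check that the closed form is binomial in q.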